Changing the Transport

Just as you can change the serialiser by importing a new one and passing it to Dataset, you can do the same with the Transport.

Use Cases

The main reason to swap transport is that the default, rsync, does not work on your system. This may be due to the remote machine, the connection, or an outdated rsync version.

Important

remotemanager requires rsync version 3.0.0 or newer (check with rsync --version). macOS devices may run an outdated version. To fix this, you can either update your install (slower, but a permanent fix) or swap to scp (fast, but must be done for each Dataset).
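To check whether your local rsync meets this requirement, the sketch below parses the output of rsync --version and compares it against 3.0.0. The helper names are hypothetical (they are not part of remotemanager), and the parser assumes the version appears as "version X.Y.Z", as it does in standard rsync builds. It only inspects the local machine; the rsync on the remote side is not checked.

```python
import re
import shutil
import subprocess


def parse_rsync_version(version_text):
    """Return the first X.Y.Z version in `rsync --version` output as a tuple of ints."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("could not find an rsync version number")
    return tuple(int(part) for part in match.groups())


def rsync_is_recent_enough(version_text, minimum=(3, 0, 0)):
    """True if the parsed version is at least `minimum` (tuples compare element by element)."""
    return parse_rsync_version(version_text) >= minimum


def local_rsync_version_text():
    """Run `rsync --version` locally; return None if rsync is not on the PATH."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(["rsync", "--version"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = local_rsync_version_text()
    if text is None:
        print("rsync not found on PATH; scp may be an alternative")
    else:
        try:
            ok = rsync_is_recent_enough(text)
        except ValueError:
            ok = False  # unrecognised output is treated as "not confirmed"
        print("rsync >= 3.0.0" if ok else "rsync older than 3.0.0 (or unrecognised); update it or swap to scp")
```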

Even if you have no issues, you can customise the transport further by setting its flags directly. This is an alternative to the method shown in the flags tutorial.

Importing

Just like with serialdill, serialjson, etc., you can import any of the available Transport classes:

  • rsync

  • scp

  • cp

Of these, cp is the least useful, as it cannot connect to external machines. It is provided for the edge case where you need no remote connection and the machine has neither rsync nor scp, and to serve as a very simple template for creating your own Transport.
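If you are unsure which of these tools a machine actually has, a quick look at the local PATH can help decide. The helper below is a hypothetical sketch, not part of remotemanager. It only checks the local machine: an rsync transfer, for instance, also needs rsync installed on the remote machine.

```python
import shutil


def first_available_tool(preference=("rsync", "scp", "cp"), which=shutil.which):
    """Return the first command name in `preference` found on the local PATH, or None.

    `which` is injectable so the lookup can be faked, e.g. in tests.
    """
    for name in preference:
        if which(name) is not None:
            return name
    return None


print("First available locally:", first_available_tool())
```

The preference order mirrors the list above, with rsync first since it is the default.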

To start, we set up a run as normal. The transport is a drop-in replacement: it has no effect on the Dataset other than changing the command used to send and retrieve data.

[1]:
from remotemanager import Dataset
[2]:
def function(x, y):
    return x * y

Since rsync is the default, let's swap to scp:

[3]:
from remotemanager.transport import scp
[4]:
ds = Dataset(
    function,
    skip=False,
    transport=scp(),  # new option!
)

Note

Like the serialiser and URL, the transport object must be instantiated (“called”) at some point after import.
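The difference between passing the class and passing an instance is ordinary Python behaviour. The sketch below uses a hypothetical stand-in class (not remotemanager's scp) to show what goes wrong when cmd is called on the class itself rather than on an instance.

```python
class ScpStandIn:
    """Hypothetical stand-in for a transport class; not remotemanager's scp."""

    def cmd(self, primary, secondary):
        return f"scp -r {primary} {secondary}"


not_instantiated = ScpStandIn    # the class object itself
instantiated = ScpStandIn()      # an instance, created by calling the class

print(isinstance(not_instantiated, type))          # True
print(instantiated.cmd("local/{a,b}", "remote/"))  # scp -r local/{a,b} remote/

try:
    # Called on the class, the first string is taken as `self`,
    # so one argument is left missing and Python raises TypeError
    not_instantiated.cmd("local/{a,b}", "remote/")
    error_message = None
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
```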

[5]:
ds.append_run({"x": 21, "y": 2})
ds.run()
ds.wait(1, 10)
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
[6]:
ds.fetch_results()
ds.results
Fetching results
Transferring 2 Files... Done
[6]:
[42]

Verification

Right now, it looks like nothing has changed, so we have to do some digging to see whether it worked.

A quick way is to check the transport property:

[7]:
ds.transport
[7]:
<remotemanager.transport.scp.scp at 0x7fd51869f310>

That reads scp, so it is at least the right module. But we want to see the actual commands, so let's search the cmd_history for commands containing scp:

[8]:
for cmd in ds.url.cmd_history:
    if "scp" in cmd.sent:
        print(cmd.sent)
        break
scp -r /home/test/remotemanager/docs/source/tutorials/temp_runner_local/{dataset-991e1c92-master.sh,dataset-991e1c92-repo.py,dataset-991e1c92-repo.sh,dataset-991e1c92-runner-0-jobscript.sh,dataset-991e1c92-runner-0-run.py} temp_runner_remote/

And there we have our first scp call, sending data from the local to the remote directory.
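The loop above stops at the first match. To collect every scp command instead, you could filter the whole history. This sketch assumes, as in the cell above, that each history entry exposes the command as a sent string; the SentCommand class and the sample history are hypothetical stand-ins so the example runs on its own. Matching on the first word, rather than any substring, avoids false hits such as a path that happens to contain "scp".

```python
from dataclasses import dataclass


@dataclass
class SentCommand:
    """Hypothetical stand-in for a cmd_history entry; only `sent` is modelled."""

    sent: str


def commands_using(history, tool):
    """Return the sent string of every entry whose first word is exactly `tool`."""
    return [entry.sent for entry in history if entry.sent.split(maxsplit=1)[:1] == [tool]]


# Hypothetical sample history, loosely modelled on the output above
history = [
    SentCommand("mkdir -p temp_runner_remote"),
    SentCommand("scp -r temp_runner_local/{a.sh,b.py} temp_runner_remote/"),
    SentCommand("ls /tmp/scp_backup"),  # contains "scp" but is not an scp call
    SentCommand("scp -r temp_runner_remote/result.json temp_runner_local/"),
]

for sent in commands_using(history, "scp"):
    print(sent)
```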

Flags

As mentioned at the top, you can set the transport's flags directly at initialisation using the flags keyword:

[9]:
ds = Dataset(
    function,
    skip=False,
    transport=scp(flags="-v"),  # flags set at initialisation
)
[10]:
ds.transport.flags
[10]:
-v
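One way to picture what the flags keyword does is a transport that splices its flags string into the command it builds. The class below is a hypothetical stand-in, not remotemanager's scp, and the position of the flags within the command is an assumption made for illustration.

```python
class FlaggedScpStandIn:
    """Hypothetical stand-in showing how a flags string could be threaded into a command."""

    def __init__(self, flags=""):
        self.flags = flags

    def cmd(self, primary, secondary):
        # Empty pieces (e.g. no flags) are dropped so no double spaces appear
        pieces = ["scp", self.flags, "-r", primary, secondary]
        return " ".join(piece for piece in pieces if piece)


verbose = FlaggedScpStandIn(flags="-v")
print(verbose.flags)                                      # -v
print(verbose.cmd("local/{a,b}", "remote/"))              # scp -v -r local/{a,b} remote/
print(FlaggedScpStandIn().cmd("local/{a,b}", "remote/"))  # scp -r local/{a,b} remote/
```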

Custom Transport

Just like with the serialiser, you can create your own transport class.

To do this, subclass Transport and add the necessary overrides (usually just the cmd method).

cmd

When overriding the cmd method of Transport, there is a pattern to follow.

The docstring of the base-level method explains this in detail.

In short, the method should accept two arguments, primary and secondary (both strings), and return a valid command as a string.
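A minimal sketch of this pattern follows. It assumes a hypothetical TransportStandIn in place of remotemanager's real Transport base class (whose constructor and other methods are not modelled here). The subclass overrides only cmd, taking the two strings and returning a cp command, in the spirit of the cp template mentioned earlier.

```python
class TransportStandIn:
    """Hypothetical stand-in for remotemanager's Transport base class."""

    def cmd(self, primary, secondary):
        # The base class leaves command construction to subclasses
        raise NotImplementedError("subclasses must override cmd(primary, secondary)")


class LocalCopy(TransportStandIn):
    """cp-based transport: only valid when both paths are on the same machine."""

    def cmd(self, primary, secondary):
        # primary arrives preformatted, e.g. "src/{a,b}"; secondary is the destination
        return f"cp -r {primary} {secondary}"


command = LocalCopy().cmd("src/{a.sh,b.py}", "dest/")
print(command)  # cp -r src/{a.sh,b.py} dest/

try:
    TransportStandIn().cmd("src/{a.sh,b.py}", "dest/")
    base_error = None
except NotImplementedError as error:
    base_error = str(error)
print("Base class:", base_error)
```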

primary

This argument arrives “preformatted” in bash syntax, for example directory_name/{file1,file2,file3,...,fileN}
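To see what such a preformatted string looks like, the hypothetical helpers below build a directory/{file1,...,fileN} string and split it back into individual paths. They are illustrations only, not remotemanager functions. One bash detail is worth knowing: a brace group with a single item, such as dir/{a}, is not expanded by bash, so this sketch omits the braces when there is only one file. It also assumes names contain no commas, braces or spaces.

```python
def format_primary(directory, filenames):
    """Build a bash brace-expansion path such as dir/{a,b,c} (hypothetical helper)."""
    if not filenames:
        raise ValueError("need at least one file")
    directory = directory.rstrip("/")
    if len(filenames) == 1:
        # bash leaves a single-item group like {a} unexpanded, so skip the braces
        return f"{directory}/{filenames[0]}"
    return f"{directory}/{{{','.join(filenames)}}}"


def expand_primary(primary):
    """Split the simple directory/{f1,...,fN} form back into paths (no nested braces)."""
    if primary.endswith("}") and "/{" in primary:
        directory, _, inner = primary.rpartition("/{")
        return [f"{directory}/{name}" for name in inner[:-1].split(",")]
    return [primary]


print(format_primary("temp_runner_local", ["master.sh", "repo.py"]))  # temp_runner_local/{master.sh,repo.py}
print(expand_primary("temp_runner_local/{master.sh,repo.py}"))
```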